Gaijin: A Bootstrapping, Template-Driven Approach to Example-Based MT
Abstract
Example-based Machine Translation (EBMT) is a recent approach to MT that offers robustness, scalability and graceful degradation. It derives its competence not from explicit linguistic models of the source and target languages, but from the wealth of bilingual corpora now available. Gaijin is such a system, employing statistical methods, string-matching, case-based reasoning and template-matching to provide a linguistics-lite EBMT solution. The only linguistics employed by Gaijin is a psycholinguistic constraint, the marker hypothesis, that is minimal, simple to apply, and arguably universal. The scope and current state of Gaijin are described, and some initial evaluation results are reported.
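As a rough illustration of how a marker-based constraint can be applied with very little linguistic machinery, the sketch below splits a sentence into chunks that each begin at a closed-class "marker" word (determiners, prepositions, conjunctions). This is not Gaijin's actual implementation: the function name `marker_segments`, the small `MARKERS` set, and the rule that keeps consecutive markers in one chunk are all assumptions made for this example.

```python
# Hypothetical, deliberately tiny marker lexicon (an assumption for this sketch).
# A real system would use a much fuller closed-class word list.
MARKERS = {"the", "a", "an", "this", "that",
           "in", "on", "of", "to", "with",
           "and", "or", "but"}


def marker_segments(sentence):
    """Split a sentence into chunks, opening a new chunk at each marker word."""
    segments = []
    current = []
    for token in sentence.split():
        is_marker = token.lower() in MARKERS
        # Open a new chunk at a marker word, but only once the current chunk
        # already holds a non-marker word, so runs such as "in the" stay together.
        if is_marker and any(t.lower() not in MARKERS for t in current):
            segments.append(current)
            current = []
        current.append(token)
    if current:
        segments.append(current)
    return [" ".join(chunk) for chunk in segments]


print(marker_segments("The cat sat on the mat"))
```

Chunks of this kind are the sort of unit that an example-based system could align across languages and reuse as template slots. How Gaijin itself does this is not specified in the abstract.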
Related resources
Iterative, MT-based Sentence Alignment of Parallel Texts
Recent research has shown that MT-based sentence alignment is a robust approach for noisy parallel texts. However, using Machine Translation for sentence alignment causes a chicken-and-egg problem: to train a corpus-based MT system, we need sentence-aligned data, and MT-based sentence alignment depends on an MT system. We describe a bootstrapping approach to sentence alignment that resolves thi...
A survey of Data Driven Machine Translation
Machine Translation (MT) refers to the use of computers for translating automatically from one language to another. The differences between source and target languages and the inherent ambiguity of the source language itself make MT a very difficult problem. Traditional approaches to MT have relied on humans giving linguistic knowledge in the form of rules to transform text. Given the vastness ...
Coping with Data-sparsity in Example-based Machine Translation
Data-driven Machine Translation (MT) systems have been found to require large amounts of data to function well. However, obtaining parallel texts for many languages is time-consuming, expensive and difficult. This thesis aims at improving translation quality for languages that have limited resources by making use of the available data more efficiently. Templates or generalizations of sentence-p...
Hybrid Strategies for better products and shorter time-to-market
The main Lingenio MT products are based on rule-based architectures. In the presentation we show how knowledge from corpora is integrated into the systems using the language analysis and translation components in a bootstrapping approach. This relates to the bilingual dictionaries, but also to learning decisions concerning the selection of syntactic rules and semantic readings in parsing and sem...
Electrophoretic Synthesis of Titanium Oxide Nanotubes
In the current research project, a sol-gel electrophoresis technique was utilized to grow titanium dioxide (TiO2) nanotubes. A titanium sol was prepared using organometallic precursors of titanium to fill the template channels. The prepared sol was driven into nanopores of porous anodic aluminum oxide templates under the influence of a DC electric field to form nanotubes on the pore walls. Tube fo...